Document summarisation based on sentence ranking using vector space model

Authors

Namita Gupta

P. C. Saxena

J. P. Gupta

Abstract

The World Wide Web is a repository of a large collection of information, available in the form of unstructured documents. Selecting the documents of interest from such a huge pool is a challenging task. Text summarization is used to speed up document retrieval: documents are ranked based on the summary or abstract provided by their authors. This is not always possible, however, because not all documents come with an abstract or summary. Moreover, when different summarization tools are used to summarize a document, not all the topics covered within the document are reflected in its summary. In this chapter, a method for automating text document summarization is proposed, based on term frequency within the document at two levels: paragraph and sentence. To summarize the document, the similarity between the paragraphs, and between the sentences within each paragraph, is considered using the Vector Space Model. Evaluation of the proposed system on the standard reference corpus from DUC-2002 using the ROUGE package indicates average recall, average precision and average F-measure comparable to those of existing summarization tools (Copernic, SweSum, Extractor, MSWord AutoSummarizer, Intelligent, Brevity, Pertinence), taking the DUC-2002 (100 words) human summary as the baseline summary.
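The abstract does not give the scoring formulas, so the following is only a minimal sketch of sentence ranking with a vector space model, not the authors' method. It assumes raw term-frequency vectors, cosine similarity, paragraphs separated by blank lines, and a sentence score equal to the sentence-to-paragraph similarity weighted by the paragraph-to-document similarity. The function names `tokenize`, `cosine` and `rank_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(document):
    """Return (score, sentence) pairs, highest score first.

    Assumed scoring (not taken from the paper): each sentence is scored
    by its similarity to its own paragraph, weighted by that paragraph's
    similarity to the whole document.
    """
    doc_vec = Counter(tokenize(document))
    scored = []
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", document.strip()):
        par_vec = Counter(tokenize(paragraph))
        par_weight = cosine(par_vec, doc_vec)
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if not sentence:
                continue
            sent_vec = Counter(tokenize(sentence))
            score = par_weight * cosine(sent_vec, par_vec)
            scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A summary under these assumptions would take the top-ranked sentences until a length budget (for example, 100 words) is reached. A weighted scheme such as TF-IDF could replace the raw counts without changing the structure of the sketch.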


Similar resources

A new model for phrase search based on weighted minimum displacement

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...


Combining a mixture language model and Naive Bayes for multi-document summarisation

The TNO system for multi-document summarisation is based on an extraction approach. We combined two statistical methods for sentence selection with a variant of the MMR algorithm. After sentence segmentation, each sentence is scored on the basis of two probabilistic models. The first model scores sentences based on a (generative) unigram language model, which is a mixture of a cluster model, a ...


Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network

Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional vector space, while preserving distinctions of word and sentence order crucial for capturing nuanced...


Multi-Document Summarisation Using Generic Relation Extraction

Experiments are reported that investigate the effect of various source document representations on the accuracy of the sentence extraction phase of a multidocument summarisation task. A novel representation is introduced based on generic relation extraction (GRE), which aims to build systems for relation identification and characterisation that can be transferred across domains and tasks withou...


Opinion-aware information management : statistical summarisation and knowledge representation of opinions

Nowadays, an increasing number of media platforms provide users with opportunities for sharing their opinions about products, companies or people. In order to support users accessing opinion-based information, and to support engineers building systems that require opinion-aware reasoning, intelligent opinion-aware tools and techniques are needed. This thesis contributes methods and technolog...



Journal:

IJDMMM

Volume 5


Publication date: 2013
